General purpose computer-assisted clustering and conceptualization.

Authors

  • Justin Grimmer
  • Gary King
Abstract

We develop a computer-assisted method for the discovery of insightful conceptualizations, in the form of clusterings (i.e., partitions) of input objects. Each of the numerous fully automated methods of cluster analysis proposed in statistics, computer science, and biology optimizes a different objective function. Almost all are well defined, but how to determine before the fact which one, if any, will partition a given set of objects in an "insightful" or "useful" way for a given user is unknown and difficult, if not logically impossible. We develop a metric space of partitions from all existing cluster analysis methods applied to a given dataset (along with millions of other solutions we add based on combinations of existing clusterings) and enable a user to explore and interact with it, quickly revealing or prompting useful or insightful conceptualizations. In addition, although it is uncommon to do so in unsupervised learning problems, we offer and implement evaluation designs that make our computer-assisted approach vulnerable to being proven suboptimal in specific data types. We demonstrate that our approach facilitates more efficient and insightful discovery of useful information than expert human coders or many existing fully automated methods.
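The idea of a metric space of partitions can be illustrated with a minimal sketch. It assumes the variation of information as the distance between two clusterings; the abstract does not name the metric, so this choice, the function names, and the toy label lists are all illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter
from math import log


def variation_of_information(a, b):
    """Distance between two partitions given as equal-length label lists.

    Assumption: variation of information, VI = H(A|B) + H(B|A), stands in
    for whatever metric the method actually uses.
    """
    if len(a) != len(b):
        raise ValueError("both partitions must label the same objects")
    n = len(a)
    count_a = Counter(a)          # cluster sizes in partition a
    count_b = Counter(b)          # cluster sizes in partition b
    joint = Counter(zip(a, b))    # sizes of the pairwise cluster overlaps
    vi = 0.0
    for (label_a, label_b), n_ab in joint.items():
        p_ab = n_ab / n
        # log(p_ab / p_a) + log(p_ab / p_b), written with raw counts
        vi -= p_ab * (log(n_ab / count_a[label_a]) + log(n_ab / count_b[label_b]))
    return vi


def distance_matrix(partitions):
    """Pairwise distances between named partitions of the same objects."""
    names = list(partitions)
    return {
        (s, t): variation_of_information(partitions[s], partitions[t])
        for s in names
        for t in names
    }


# Hypothetical outputs of three clustering methods on six objects.
partitions = {
    "method_a": [0, 0, 1, 1, 2, 2],
    "method_b": [0, 0, 0, 1, 1, 1],
    "method_c": [5, 5, 7, 7, 9, 9],  # same grouping as method_a, relabeled
}
distances = distance_matrix(partitions)
```

Because the distance depends only on how objects are grouped, not on the label names, `method_a` and `method_c` sit at the same point of the space, while `method_b` lies some distance away. A matrix like this could then be embedded in two dimensions for a user to explore.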

Similar resources

A new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble

Ensemble clustering has been considered one of the research approaches in data mining, pattern recognition, machine learning, and artificial intelligence over the last decade. In ensemble clustering, the combination first produces several base clusterings, and then an aggregation function is used to create a final clustering that is as similar as possible to all of them. The inp...

Investigating the Problems and Needs of Infertile Patients Referring to Assisted Reproduction Centers: A Review Study

Background: The provision of optimal care is the most important goal in nursing, the fulfillment of which requires the identification of clients' problems and needs. However, based on a review of the literature, no review study has investigated the problems and needs of infertile patients in Iran. Aim: The purpose of the present study was to investigate the problems and needs of the infer...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Comparison of Three Instructional Methods for Drug Calculation Skill in Nursing Critical Care Courses: Lecturing, Problem Solving, and Computer-Assisted Self-Learning

Introduction: Given the development of educational systems and the importance of education in the nursing profession, students clearly need appropriate instructional methods for new theoretical and practical skills. The purpose of this study is to compare the effects of three methods (lecture, problem solving, and computer-assisted self-learning) on the drug calculation skill on thi...

Journal:
  • Proceedings of the National Academy of Sciences of the United States of America

Volume 108, Issue 7

Pages: -

Publication date: 2011